Evaluating resource acquisition tools for Information Extraction

Authors

  • Thierry Poibeau
  • Dominique Dutoit
  • Sophie Bizouard
Abstract

This paper evaluates two approaches to building semantic classes. The setting is an Information Extraction application, which requires a large amount of domain-dependent resources and for which the notion of semantic class is central. An endogenous approach (acquisition from a corpus) is contrasted with an exogenous one (the use of a large semantic network). The paper presents a detailed evaluation of the two techniques and of their possible complementarity.


Similar resources

What You Seek Is What You Get: Extraction of Class Attributes from Query Logs

Within the larger area of automatic acquisition of knowledge from the Web, we introduce a method for extracting relevant attributes, or quantifiable properties, for various classes of objects. The method extracts attributes such as capital city and President for the class Country, or cost, manufacturer and side effects for the class Drug, without relying on any expensive language resources or co...


A New Method for Improving Computational Cost of Open Information Extraction Systems Using Log-Linear Model

Information extraction (IE) is a process of automatically providing a structured representation from an unstructured or semi-structured text. It is a long-standing challenge in natural language processing (NLP) which has been intensified by the increased volume of information and heterogeneity, and non-structured form of it. One of the core information extraction tasks is relation extraction wh...


Syntactic Annotation of Large Corpora in STEVIN

The construction of a 500-million-word reference corpus of written Dutch has been identified as one of the priorities in the Dutch/Flemish STEVIN programme. For part of this corpus, manually corrected syntactic annotations will be provided. The paper presents the background of the syntactic annotation efforts, the Alpino parser which is used as an important tool for constructing the syntactic a...


Report on the Fourth International Conference on Knowledge Capture (K-CAP 2007)

The conference was held at The Fairmont Chateau in Whistler. Whistler is a spectacular setting, and is one of the principal sites of the 2010 Winter Olympic Games. Views from the Conference hotel were breathtaking and many of the participants took advantage of the venue to participate in various forms of outdoor sports. • Mixed-initiative planning & decision-support tools • Acquisition of probl...


Benchmarking ontology-based annotation tools for the Semantic Web

This paper discusses and explores the main issues for evaluating ontology-based annotation tools, a key component in text mining applications for the Semantic Web. Semantic annotation and ontology-based information extraction technologies form the cornerstone of such applications. There has been a great deal of work in the last decade on evaluating traditional information extraction (IE) systems...




Publication date: 2002